Messenger API Design Evaluation and Latency Budget

Understand how we achieve the non-functional requirements for the Messenger API.

Introduction#

Designing an adequate API is a complex task that involves fine-tuning several technical dimensions. Each dimension has its own parameters to optimize, and their tradeoffs must be considered. In this lesson, we'll discuss how the non-functional requirements can be achieved and what optimization decisions we need to make. We'll also discuss the latency and response time of our proposed Messenger API.

Non-functional requirements#

The following section discusses how the Messenger API meets the non-functional requirements:

Consistency#

A chat application requires rolling out new features frequently, which calls for periodic API versioning. To achieve consistency, we keep the endpoints, error messages, URL patterns, and relevant data entities uniform across versions. Moreover, messages in a chat must be delivered in sequence; therefore, we utilize a FIFO (First In, First Out) messaging queue with strict ordering.
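As a rough sketch of this idea, the following Python snippet pairs a FIFO queue with a simple counter-based sequencer. The names and structure are illustrative assumptions, not the actual Messenger implementation:

```python
from dataclasses import dataclass
from itertools import count
from queue import Queue

@dataclass
class SequencedMessage:
    seq: int      # sequencer-assigned ID; defines the global message order
    sender: str
    text: str

class FifoMessageQueue:
    """Hypothetical sketch: messages leave in the exact order they were accepted."""

    def __init__(self):
        self._seq = count(1)   # monotonically increasing sequence numbers
        self._queue = Queue()  # queue.Queue preserves FIFO order

    def publish(self, sender, text):
        msg = SequencedMessage(next(self._seq), sender, text)
        self._queue.put(msg)
        return msg

    def deliver(self):
        return self._queue.get()

q = FifoMessageQueue()
q.publish("alice", "hi")
q.publish("bob", "hello")
first = q.deliver()   # "hi" comes out first, with seq == 1
```

Because every message carries its sequence number, a receiver can also detect gaps or reordering introduced downstream.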

Point to Ponder

Question

In the context of consistency, how do we make sure that all group participants see the messages in the same order?


To ensure consistency in message ordering in a group chat, we should consider the following approaches collectively:

  • Use a single database: Store all messages in a single database so that all participants read from the same source of truth, which helps maintain consistency. Redundant replicas of this database ensure the availability and reliability of the system.
  • Unique sequence number: Assigning a unique sequence number to each message can help ensure that messages are delivered and displayed in the correct order. This can be achieved through a sequencer. Some sequencers enable us to infer causality or have wall-clock time as part of their sequence, which makes it possible to monotonically order messages. See sequencer design for details.
  • Messaging queue: Utilizing a messaging queue ensures that messages are processed in the correct order. It also helps to prevent messages from being lost or duplicated.
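The sequence-number approach above can be illustrated with a small sketch: however messages arrive at a participant, sorting by the sequencer-assigned ID reconstructs the same order for everyone. The values below are hypothetical:

```python
# Messages may reach a group participant out of order (e.g., over
# different network paths); sorting by the sequencer-assigned ID
# restores the single global order every member sees.
received = [
    {"seq": 3, "text": "See you there"},
    {"seq": 1, "text": "Lunch at noon?"},
    {"seq": 2, "text": "Sure!"},
]
timeline = sorted(received, key=lambda m: m["seq"])
texts = [m["text"] for m in timeline]
# texts is now in the order the sequencer assigned: 1, 2, 3
```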

Availability and disaster recovery#

To enhance availability, we divide responsibilities among various services and install redundant servers to avoid overburdening a single service. Furthermore, cascading failures are avoided using circuit breakers. We also provide sufficient chat servers and corresponding WebSocket managers to handle their mapping with clients. Therefore, even if a WebSocket connection to one chat server fails, the session is recreated, possibly with a different server. Moreover, the messages are stored on highly consistent and efficient database clusters, such as HBase and MyRocks, which provide high availability and reliability via region replication.

Note: HBase is an open-source distributed key-value store based on HDFS. It is known for its consistent read and write operations, scalability, and support for MapReduce jobs. MyRocks is an open-source database introduced by Facebook that integrates RocksDB as a MySQL storage engine.
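The cascading-failure protection mentioned above can be sketched as a minimal circuit breaker. This is illustrative Python, not the service's actual code; the failure threshold and reset timeout are assumed values:

```python
import time

class CircuitBreaker:
    """Minimal circuit-breaker sketch: after `max_failures` consecutive
    errors the breaker opens and fails fast, shielding callers from a
    struggling downstream service; after `reset_timeout` seconds it
    lets one trial call through (half-open state)."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow a trial call
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

Failing fast while the breaker is open keeps request threads from piling up behind a slow dependency, which is what turns one failure into a cascade.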

Security#

For authentication and authorization, a user can log in to Messenger using the username and password they chose during the signup phase. Based on these credentials, a JWT (JSON Web Token) is generated, which is used for the duration of that session only. Apart from authentication and authorization, we provide an end-to-end encryption mechanism to secure communication. This secure communication uses secret keys (symmetric encryption) that are exchanged using asymmetric cryptography, primarily based on the Signal protocol.
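As a simplified illustration of session-token issuance, the sketch below builds an HS256-signed, JWT-style token using only the standard library. A production service would use a vetted library such as PyJWT; the secret key and expiry here are assumed values:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"server-side-secret"  # illustrative; never hard-code real secrets

def b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64 for each segment.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(username: str, ttl_seconds: int = 3600) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(
        {"sub": username, "exp": int(time.time()) + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}".encode()
    # HS256 = HMAC-SHA256 over "header.payload" with the server's secret.
    signature = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{signature}"

token = issue_token("alice")  # header.payload.signature
```

Because the token carries an `exp` claim and is verified against a server-side secret, it naturally scopes access to a single session, matching the behavior described above.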

Low latency#

It is imperative to achieve low latency for a chat application. That is why, for real-time chat, we choose a WebSocket connection, even though it consumes more resources than a stateless HTTP connection. However, using WebSocket for sending media files may increase latency because it lacks multiplexing. To mitigate this issue, we upload media files via a separate HTTP connection. Also, because viral media files are often shared repeatedly among users, we use hashing to avoid storing the same content multiple times. Furthermore, we employ read-through caching for messages, which enhances the performance of our API since read and write operations reach the database through the cache.
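The read-through pattern mentioned above can be sketched as follows: the cache fronts the message store, and on a miss it loads the record from the backing database and caches it. This is a minimal sketch, with a plain dict standing in for the database cluster:

```python
class ReadThroughCache:
    """Read-through cache sketch (illustrative): callers always go
    through the cache; the cache itself fetches from the backing store
    on a miss, so hot conversations stop hitting the database."""

    def __init__(self, db):
        self.db = db        # backing store, e.g. an HBase/MyRocks cluster
        self.cache = {}
        self.misses = 0

    def get(self, message_id):
        if message_id not in self.cache:
            self.misses += 1
            self.cache[message_id] = self.db[message_id]  # load on miss
        return self.cache[message_id]

store = {"m1": "hello"}
cache = ReadThroughCache(store)
cache.get("m1")  # miss: loaded from the store
cache.get("m1")  # hit: served from the cache
```

A real deployment would add eviction and expiry (e.g., LRU with a TTL), but the key property is the same: the application never talks to the database directly for reads.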

Achieving Non-Functional Requirements

| Non-Functional Requirements | Approaches |
| --- | --- |
| Consistency | Uniformity across different API versions; a FIFO queue with a sequencer for assigning IDs to messages |
| Availability | Decoupling of services; region replication |
| Security | End-to-end encryption |
| Low latency | Uploading media files via HTTP instead of WebSocket; avoiding duplicate storage of media files; the read-through caching approach |

Latency budget of the Messenger API#

We are utilizing two application layer protocols—HTTP and WebSocket—in the Messenger API. Therefore, we have two operations that impact the response time of our API. One of these operations is uploading the media file via HTTP, and the other is real-time chat via WebSocket. We have discussed the response time of the file API operation while designing its respective API. In this section, we focus on the response time of the one-to-one communication between two clients.

Assumptions: We make the following assumptions before estimating the latency:

  1. The text message sizes are usually smaller than 1 KB, which means we can roughly use the same RTT that we assumed in the back-of-the-envelope latency calculations.

  2. We don't consider the latency of media files, because they are sent and received through HTTP instead of WebSockets. For media files, we can refer to the estimates of file upload API.

  3. The WebSocket connection is already established, which takes a maximum of 275.9 ms, and the receiver is online.

Let's calculate the end-to-end delivery time of our API in the following steps:

  1. The sender sends the message to its corresponding chat server in 35 ms. This is half the RTT we estimated in the back-of-the-envelope calculation.

  2. The chat server processes the message in 0.125 ms to decide whom the message is intended for.

  3. The message is delivered to the messaging queue in 10 ms, assuming that the messaging queue resides in another zone.

  4. The messaging queue takes around 0.125 ms to process the message, place it in the queue, and decide which server the message needs to be forwarded to.

  5. The message is delivered to the chat server to which the receiver is connected in 10 ms.

  6. The chat server of the receiver processes the message and forwards it to the receiver in 0.125 ms.

  7. The message is delivered to the receiver from the chat server in 35 ms.

  8. The receiver sends the acknowledgment to the chat server, which is forwarded to all other components involved in the communication.
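The per-hop delays in steps 1–7 can be tallied as follows; the acknowledgment in step 8 retraces a similar path, roughly doubling the one-way figure to the end-to-end time of about 180 ms. This is a sketch using the numbers from the steps above:

```python
# One-way delivery: per-hop delays from steps 1-7 (in milliseconds).
hops = {
    "sender -> chat server (half RTT)":     35.0,
    "chat server processing":                0.125,
    "chat server -> messaging queue":       10.0,
    "messaging queue processing":            0.125,
    "queue -> receiver's chat server":      10.0,
    "receiver's chat server processing":     0.125,
    "chat server -> receiver (half RTT)":   35.0,
}
one_way_ms = sum(hops.values())   # 90.375 ms

# The acknowledgment (step 8) travels back over a similar path, roughly
# doubling the one-way time to the ~180 ms end-to-end figure in the lesson.
round_trip_ms = 2 * one_way_ms
```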

So delivering a message from the sender to the receiver and returning the acknowledgment takes approximately 180.25 ms, as shown in the following slides.

[An 11-slide illustration walks through the delivery steps above; in the final slide, chat server 1 informs the sender in 35 ms that the message has been delivered to the receiver.]

Assuming that the initial connection takes 275.9 ms to establish, it will take a total of 456.15 ms to deliver the first message, as shown below:

180.25 ms (message delivery) + 275.9 ms (connection establishment) = 456.15 ms

It is important to realize that the 456.15 ms in the equation above includes the 275.9 ms latency for establishing a WebSocket connection, which is done only once at the start of the chat session. Also, note that we have assumed the maximum possible times for communication between two clients. For example, we assumed the messaging queue and the chat servers were in different regions, whereas these components are likely in the same region or zone.

In this lesson, we described how we achieve the non-functional requirements of our API for a chat application like Messenger. At the end of the lesson, we also estimated the time it takes for a message to be delivered to the receiver and for the acknowledgment to come back.
